WHAT DISTINGUISHES AN INDEPENDENTLY OBSERVED VECTOR
FROM AN ESTIMATED MULTIVARIATE NORMAL POPULATION?


By 

A.  C.  BITTNER,  JR. 


SUMMARY 

Once a researcher has determined that a multivariate observation is different from an estimated population, he still has an unanswered question. He wants to know, "How is it different?" This report considers a method for answering this question.

Two results are shown, and both involve the estimated population parameters x̄ (the estimated mean) and Σ̂ (the estimated covariance matrix). The first result answers the question "Which linear combinations of the elements of the difference x_o − x̄ are significant?" The second answers the question "Which elements of the difference x_o − x̄ are significant?" Both results are derived using S. N. Roy's Union-Intersection Principle. Hence, one can set an overall α (significance, or type-I error) level for the totality of tests.

A brief discussion of an application is also presented.

Publication  Unclassified. 


Approved for public release; distribution unlimited.


GLOSSARY 


x_o        A p-component (p by 1) vector observation, "o", independent of the vectors x_1, x_2, ..., x_n

n          The number of independent observations x_1, x_2, ..., x_n, which are independent of the vector x_o

x̄          A p-component (p by 1) estimate of the population mean vector μ, x̄ = (1/n) Σ_{i=1}^{n} x_i


μ          A p-variate (p by 1) population mean vector

Σ          A p by p population covariance matrix

Σ̂          An unbiased estimate of the population covariance matrix based on m degrees of freedom

Σ̂⁻¹        The inverse of the matrix Σ̂

α          The probability that the (null) hypothesis will be rejected when it is true

H_0        The hypothesis that the vector x_o and the set of vectors x_1, x_2, ..., x_n are from the same population

H_1        The negation of the hypothesis H_0

N(μ, Σ)    The multivariate normal population with parameters μ and Σ

m          The number of degrees of freedom of the matrix estimate Σ̂

t          A Student's t-test statistic

c          A (1 by p) p-variate vector of constant coefficients


c'         The p by 1 transpose of the vector c

           The (1 by p) p-variate vector which maximizes the function T²(c)

e_i        A 1 by p vector which has a 1 as the i-th element and zeros elsewhere

∂T²(c)/∂c  A p-variate (p by 1) vector of partial derivatives whose i-th component is ∂T²(c)/∂c_i

0          A p by 1 vector which has zeros for all the elements

           An unbiased estimate of the i-th variate from N(μ, Σ)

D          A matrix conformal with x
INTRODUCTION

In a previous report by the author (reference 1), a probability density/distribution function was developed for an observed (p by 1) vector (x_o) from a multivariate normal distribution with estimated parameters (see A-1).* This result enables a researcher to ask the general question "How unlikely is it that the observation arose from N(μ, Σ), where μ is estimated by x̄ and Σ is estimated by Σ̂?" Specifically, consider the statistic

$$ T^2 = \frac{n}{n+1}\,(x_o - \bar{x})'\,\hat{\Sigma}^{-1}(x_o - \bar{x}) \tag{1} $$

It is distributed as [mp/(m − p + 1)] F with p and m − p + 1 degrees of freedom (df), and it can be compared with tables of the F distribution to obtain its probability of occurrence. If the probability of occurrence is less than some specified amount (α), then the hypothesis (H_0) that x_o is from the same population that generated x̄ can be rejected. In other words, the hypothesis (H_1) that the populations which generated x_o and x̄ are not the same would be accepted.
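Equation (1) can be computed in a few lines. The following is a minimal sketch, not part of the original report. It assumes the vectors are plain Python lists, Σ̂ is a list of rows, and the tabled F value is supplied by the caller: only the standard library is used, so no F quantile function is available. The names `solve`, `t2_single` and `criterion` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # After elimination the coefficient part is diagonal.
    return [aug[i][k] / aug[i][i] for i in range(k)]


def t2_single(xo, xbar, sigma_hat, n):
    """Equation (1): n/(n+1) * (xo - xbar)' Sigma_hat^{-1} (xo - xbar)."""
    d = [a - b for a, b in zip(xo, xbar)]
    w = solve(sigma_hat, d)  # w = Sigma_hat^{-1} d, without forming the inverse
    return n / (n + 1) * sum(di * wi for di, wi in zip(d, w))


def criterion(m, p, f_crit):
    """Scaled critical value mp/(m-p+1) * F_alpha(p, m-p+1); f_crit from an F table."""
    return m * p / (m - p + 1) * f_crit


print(t2_single([1, 1], [0, 0], [[1, 0], [0, 1]], 10))
```

For example, with Σ̂ = I, x_o − x̄ = (1, 1)' and n = 10, the statistic is (10/11)(2) = 20/11 ≈ 1.82. H_0 is rejected when this value exceeds `criterion(m, p, f_crit)`.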


Rejecting the hypothesis (H_0) that x_o and x̄ are from the same population doesn't show which of the variates of x_o were significantly different. Individual tests of significance could be constructed (e.g., t-tests), but they give no control of the overall α level for the set of comparisons. The reasons for this are twofold: there are p such tests, and the variates are generally correlated. Hence an approach is needed for testing the individual variates of x_o while controlling the overall α level. The purpose of the present development is to delineate such a procedure.


*The  results  A-1  and  A-2  are  working  theorems  which  are  given  in  the  appendix. 
APPROACH


The approach employed here uses S. N. Roy's Union-Intersection Principle (references 2 and 3). This principle allows one to fix the overall significance level for the totality of tests of linear compounds of the difference x_o − x̄. Operationally, the criterion (α-level) required for the test of the most unlikely linear compound, c(x_o − x̄), is applied to every particular test of a linear compound. The individual variate differences x_oi − x̄_i (i = 1, ..., p) can be tested by the linear compounds e_i(x_o − x̄) (i = 1, ..., p). This procedure therefore yields a solution to the problem posed in the Introduction.*

Let us define the statistic T²(c) as follows:

$$ T^2(c) = \frac{n}{n+1}\,\frac{[c(x_o - \bar{x})]^2}{c\,\hat{\Sigma}\,c'} \tag{2} $$

where c is a fixed 1 by p vector. This equation is a special case of equation (1): the p-variate terms x_o, x̄ and Σ̂ are replaced by the corresponding univariate terms c x_o, c x̄ and c Σ̂ c'. These univariate terms are those appropriate for linear compounds of the respective variates (see A-2). Under the hypothesis (H_0) that x_o is from the same population that generated x̄, equation (2) is distributed as F with 1 and m degrees of freedom. Hence, the most unlikely T²(c) value occurs for the c which maximizes (2).
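Equation (2) for a fixed compound c can be written out directly. This is a minimal sketch, assuming plain-list inputs; the helper names `t2_of_c` and `unit` are hypothetical. It shows that c = e_i recovers the squared univariate statistic, and that T²(c) does not change when c is rescaled.

```python
def t2_of_c(c, xo, xbar, sigma_hat, n):
    """Equation (2): n/(n+1) * [c(xo - xbar)]^2 / (c Sigma_hat c')."""
    p = len(c)
    d = [a - b for a, b in zip(xo, xbar)]
    num = sum(ci * di for ci, di in zip(c, d)) ** 2
    # c Sigma_hat c' is the estimated variance of the compound c x (see A-2)
    den = sum(c[i] * sigma_hat[i][j] * c[j] for i in range(p) for j in range(p))
    return n / (n + 1) * num / den


def unit(i, p):
    """e_i with a 1 in position i (0-based here) and zeros elsewhere."""
    return [1.0 if j == i else 0.0 for j in range(p)]


xo, xbar, n = [3.0, 1.0], [1.0, 0.0], 9
sigma_hat = [[4.0, 1.0], [1.0, 2.0]]
# With c = e_1 the statistic is (9/10) * 2^2 / 4 = 0.9
print(t2_of_c(unit(0, 2), xo, xbar, sigma_hat, n))
```

Because numerator and denominator are both quadratic in c, any nonzero multiple of c gives the same T²(c). Only the direction of the compound matters.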

DERIVATIONS 

The following theorem contains both the conditions for maximizing T²(c) and a procedure for testing the significance of the totality of linear compounds with a fixed overall significance level.

Theorem 1.0. If T²(c) is defined as in equation (2), then its maximum value is

$$ \max_{c} T^2(c) = \frac{n}{n+1}\,(x_o - \bar{x})'\,\hat{\Sigma}^{-1}(x_o - \bar{x}) \tag{3} $$

which is distributed as [mp/(m − p + 1)] F with p and m − p + 1 degrees of freedom when H_0 is true. This value of T²(c) is obtained for

$$ c = (x_o - \bar{x})'\,\hat{\Sigma}^{-1} \tag{4} $$
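Theorem 1.0 can be checked numerically. This is an illustrative sketch with made-up inputs and hypothetical helper names. It computes c = (x_o − x̄)'Σ̂⁻¹ and confirms that T²(c) at this c equals the statistic of equation (1). It also checks that 1000 randomly drawn compounds never exceed that value.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def t2_of_c(c, d, sigma_hat, n):
    """Equation (2) with d = xo - xbar already formed."""
    p = len(c)
    num = sum(ci * di for ci, di in zip(c, d)) ** 2
    den = sum(c[i] * sigma_hat[i][j] * c[j] for i in range(p) for j in range(p))
    return n / (n + 1) * num / den


xo, xbar, n = [3.0, 1.0, -1.0], [1.0, 0.0, 0.0], 9
sigma_hat = [[4.0, 1.0, 0.5], [1.0, 2.0, 0.3], [0.5, 0.3, 1.0]]  # positive definite
d = [a - b for a, b in zip(xo, xbar)]

# Sigma_hat is symmetric, so the row vector d' Sigma_hat^{-1} in (4) has the
# same entries as the column vector Sigma_hat^{-1} d.
c_star = solve(sigma_hat, d)
t2_max = t2_of_c(c_star, d, sigma_hat, n)
t2_eq1 = n / (n + 1) * sum(di * ci for di, ci in zip(d, c_star))  # equation (1)

rng = random.Random(0)
trials = [t2_of_c([rng.gauss(0.0, 1.0) for _ in range(3)], d, sigma_hat, n)
          for _ in range(1000)]
print(round(t2_max, 6), round(t2_eq1, 6), max(trials) <= t2_max + 1e-9)
```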


*e_i (i = 1, ..., p) is a 1 by p vector with a 1 in the i-th entry and zeros elsewhere.
$$ \frac{n}{n+1}\,\hat{\Sigma}^{-1}(x_o - \bar{x})(x_o - \bar{x})' \tag{10} $$


The matrix (10) has exactly one nonzero eigenvalue. The rank of (x_o − x̄)(x_o − x̄)' is one, and its product with any conformal nonsingular matrix (e.g., Σ̂⁻¹) has the same rank.* The trace (tr) of (10) equals the sum of its eigenvalues, and exactly one of these, the maximum of T²(c), is nonzero. It follows that

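$$ \max_{c} T^2(c) = \operatorname{tr}\Big[\frac{n}{n+1}\,\hat{\Sigma}^{-1}(x_o - \bar{x})(x_o - \bar{x})'\Big] \tag{11} $$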

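By the cyclic property of traces (tr AB = tr BA), this can be rewritten as

$$ \max_{c} T^2(c) = \operatorname{tr}\Big[\frac{n}{n+1}\,(x_o - \bar{x})'\,\hat{\Sigma}^{-1}(x_o - \bar{x})\Big] \tag{12} $$

$$ \max_{c} T^2(c) = \frac{n}{n+1}\,(x_o - \bar{x})'\,\hat{\Sigma}^{-1}(x_o - \bar{x}) \tag{13} $$

This is the first result (3) of Theorem 1.0.** The distribution of (3) or (13) is given by A-1, which establishes the first result of the theorem.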
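The trace argument can also be checked by exhibiting the nonzero eigenvalue directly. In this added illustration, write d = x_o − x̄ for brevity. The worked step shows that Σ̂⁻¹d is an eigenvector of the rank-one matrix (n/(n+1)) Σ̂⁻¹ d d', because d'Σ̂⁻¹d is a scalar:

$$ \Big[\frac{n}{n+1}\,\hat{\Sigma}^{-1} d\, d'\Big]\hat{\Sigma}^{-1} d = \frac{n}{n+1}\,\hat{\Sigma}^{-1} d\,\big(d'\hat{\Sigma}^{-1} d\big) = \Big[\frac{n}{n+1}\,d'\hat{\Sigma}^{-1} d\Big]\hat{\Sigma}^{-1} d $$

The corresponding eigenvalue is (n/(n+1)) d'Σ̂⁻¹d. This agrees with the maximum value (3) stated in Theorem 1.0.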

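The second result of this theorem follows upon substituting the value of c given in equation (4) for c in equation (6). This yields a T²(c) value of

$$ T^2(c) = \frac{n}{n+1}\,\frac{\big[(x_o - \bar{x})'\hat{\Sigma}^{-1}(x_o - \bar{x})\big]^2}{(x_o - \bar{x})'\hat{\Sigma}^{-1}\hat{\Sigma}\,\hat{\Sigma}^{-1}(x_o - \bar{x})} \tag{14} $$

or equivalently

$$ T^2(c) = \frac{n}{n+1}\,(x_o - \bar{x})'\,\hat{\Sigma}^{-1}(x_o - \bar{x}) \tag{15} $$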


Because this corresponds to the maximum value of T²(c), equation (4) gives the desired vector.
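The same maximum can be reached without eigenvalues. The sketch below uses the Cauchy-Schwarz inequality, a standard alternative route rather than the argument of this report. It assumes Σ̂ is positive definite, so that its symmetric square root Σ̂^{1/2} exists, and writes d = x_o − x̄:

$$ [c\,d]^2 = \big[(\hat{\Sigma}^{1/2}c')'(\hat{\Sigma}^{-1/2}d)\big]^2 \le (c\,\hat{\Sigma}\,c')\,(d'\hat{\Sigma}^{-1}d) $$

$$ T^2(c) = \frac{n}{n+1}\,\frac{[c\,d]^2}{c\,\hat{\Sigma}\,c'} \le \frac{n}{n+1}\,d'\hat{\Sigma}^{-1}d $$

Equality holds exactly when Σ̂^{1/2}c' is proportional to Σ̂^{-1/2}d. That is, c must be proportional to d'Σ̂⁻¹, which is equation (4).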


The last portion of the theorem follows from the nature of the Union-Intersection Principle. Specifically, suppose the significance criterion (α-level) of the maximum of T²(c) is employed for each individual test of the type T²(c). Then the totality of such tests has a joint significance level of α. By the first part of the theorem and A-1, this criterion is

$$ \frac{mp}{m-p+1}\,F_{\alpha;\,p,\,m-p+1} $$

and so the general test (5) follows directly.
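The union-intersection logic of test (5) can be sketched as follows. Every compound in a family is judged against the same criterion. This is a minimal illustration under stated assumptions: the inputs are made up, `f_crit` is a hypothetical value chosen only so that the criterion equals 1.0, and the helper names are hypothetical.

```python
def t2_of_c(c, d, sigma_hat, n):
    """Equation (2) with d = xo - xbar already formed."""
    p = len(c)
    num = sum(ci * di for ci, di in zip(c, d)) ** 2
    den = sum(c[i] * sigma_hat[i][j] * c[j] for i in range(p) for j in range(p))
    return n / (n + 1) * num / den


def judge_compounds(compounds, xo, xbar, sigma_hat, n, m, f_crit):
    """Return (c, T^2(c), significant?) for each compound c in the family."""
    p = len(xo)
    d = [a - b for a, b in zip(xo, xbar)]
    crit = m * p / (m - p + 1) * f_crit  # one criterion for the whole family
    results = []
    for c in compounds:
        t2 = t2_of_c(c, d, sigma_hat, n)
        results.append((c, t2, t2 > crit))
    return results


# e_1, e_2, and (3, 2), which is a multiple of the maximizing c in (4) here
family = [[1.0, 0.0], [0.0, 1.0], [3.0, 2.0]]
res = judge_compounds(family, [3.0, 1.0], [1.0, 0.0], [[4.0, 1.0], [1.0, 2.0]],
                      n=9, m=12, f_crit=11 / 24)  # hypothetical f_crit
for c, t2, sig in res:
    print(c, round(t2, 4), sig)
```

Every T²(c) is at most the maximum (3). So no compound is flagged unless the overall statistic itself exceeds the criterion, and the maximizing compound is flagged whenever it does.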


*See reference 4 for a discussion of the rank of the product of two matrices.

**Reference 4 also contains a discussion of various results concerning the traces of matrices.
Tests for the individual components can be derived by specializing equation (5). Testing the i-th component for significance is equivalent to testing T²(e_i) for significance. One can therefore substitute e_i for c in equation (5) to obtain a specialized result, as the remainder of this section shows.

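Substitution of e_i for c in equation (5) yields

$$ T^2(e_i) > \frac{mp}{m-p+1}\,F_{\alpha;\,p,\,m-p+1} \quad (\text{reject } H_0) \tag{16} $$

which by equation (2) is equivalent to

$$ \frac{n}{n+1}\,\frac{[e_i(x_o - \bar{x})]^2}{e_i\,\hat{\Sigma}\,e_i'} > \frac{mp}{m-p+1}\,F_{\alpha;\,p,\,m-p+1} \tag{17} $$

Multiplying out each of the bracketed terms gives

$$ \frac{n}{n+1}\,\frac{(x_{oi} - \bar{x}_i)^2}{\hat{\sigma}_{ii}} > \frac{mp}{m-p+1}\,F_{\alpha;\,p,\,m-p+1} \tag{18} $$

where x_oi is the i-th entry of x_o, x̄_i is the i-th entry of x̄, and σ̂_ii is the i,i-th entry of Σ̂.

The diagonal entries of Σ̂ are variance estimates, so σ̂_ii can be written as s_i², where s_i² is the estimated variance of the i-th variate. Thus equation (18) can be written

$$ \frac{n}{n+1}\,\frac{(x_{oi} - \bar{x}_i)^2}{s_i^2} > \frac{mp}{m-p+1}\,F_{\alpha;\,p,\,m-p+1} \quad (\text{reject } H_0) \tag{19} $$

This considerably simplifies the test equation of the theorem. The result is summarized in the following corollary: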

Corollary 1.1. Let x_oi be the i-th entry of x_o, x̄_i be the i-th entry of x̄, and s_i² be the i,i-th entry of Σ̂. Then the p variates of x_o can be individually tested for difference from x̄ by

$$ \frac{n}{n+1}\,\frac{(x_{oi} - \bar{x}_i)^2}{s_i^2} > \frac{mp}{m-p+1}\,F_{\alpha;\,p,\,m-p+1} \quad (\text{reject } H_0), \qquad i = 1, \dots, p \tag{20} $$

with assurance that the overall significance level for the entire set of p tests does not exceed α.
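Corollary 1.1 needs only the diagonal of Σ̂. The following is a minimal sketch, assuming `s2` holds the entries s_i² and `f_crit` is a tabled F_α(p, m − p + 1) value supplied by the caller. The function name and the numbers in the usage line are hypothetical.

```python
def corollary_1_1(xo, xbar, s2, n, m, f_crit):
    """Return (T^2 for variate i, flagged?) for each variate, per equation (20)."""
    p = len(xo)
    crit = m * p / (m - p + 1) * f_crit  # same criterion as the theorem
    results = []
    for x_oi, xbar_i, s2_i in zip(xo, xbar, s2):
        t2 = n / (n + 1) * (x_oi - xbar_i) ** 2 / s2_i
        results.append((t2, t2 > crit))
    return results


# Criterion = 9 * 2 / 8 * 4.0 = 9.0; the first variate gives 250/11, the second 2.5/11
print(corollary_1_1([5.0, 0.5], [0.0, 0.0], [1.0, 1.0], n=10, m=9, f_crit=4.0))
```

Each variate is compared with the criterion for the maximum over all compounds, not with the F_α(1, m) point a lone t-test would use. This larger criterion is what keeps the overall level at or below α.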
DISCUSSION


The preceding sections derived two results which answer the question "How does the observation x_o differ from the population which generated x̄?" The first, Theorem 1.0, allows one to ask whether any particular linear combination of variates (c x_o) distinguishes x_o from the population which generated x̄. The second, Corollary 1.1, allows one to ask whether any particular variate distinguishes x_o from the x̄-generating population. Both results have obvious practical applications. Let us briefly consider one.

In a multiple-criteria experiment, an unforeseen (random) event occurs which makes a single observation x_o suspect. The first question facing the researcher is, "Is x_o from the same population as the set of other observations (x_1, x_2, ..., x_n) from the same condition?"

This question can be answered by employing the statistic

$$ T^2 = \frac{n}{n+1}\,(x_o - \bar{x})'\,\hat{\Sigma}^{-1}(x_o - \bar{x}) \tag{21} $$

where x̄ = (1/n) Σ_{i=1}^{n} x_i and Σ̂ = [1/(n − 1)] Σ_{i=1}^{n} (x_i − x̄)(x_i − x̄)'. T² is distributed as [(n − 1)p/(n − p)] F with p and n − p degrees of freedom.* If this statistic (21) is significant, the next question is, "Which variates of x_o differ from the population which generated x̄ and Σ̂?" This can be answered by applying Corollary 1.1 with m equal to n − 1. Guided by the cost and number of observations, the researcher could use the results of these tests to decide whether to include x_o in the data of the experiment.
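The screening application can be sketched end to end. The sketch forms x̄ and Σ̂ (with n − 1 df) from the other observations, computes (21), and then applies Corollary 1.1 with m = n − 1. With m = n − 1, both stages share the criterion [(n − 1)p/(n − p)] F_α(p, n − p). This is an illustration only: the data are made up, `f_crit` is a hypothetical tabled value supplied by the caller, and `screen_observation` is a hypothetical name.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def screen_observation(xs, xo, f_crit):
    """Return (overall T^2, criterion, [(T^2_i, flagged?) per variate]).

    xs: the n other observations; f_crit: tabled F_alpha(p, n - p) value.
    """
    n, p = len(xs), len(xo)
    xbar = [sum(col) / n for col in zip(*xs)]
    dev = [[x[i] - xbar[i] for i in range(p)] for x in xs]
    # Unbiased covariance estimate with m = n - 1 degrees of freedom
    sigma_hat = [[sum(e[i] * e[j] for e in dev) / (n - 1) for j in range(p)]
                 for i in range(p)]
    d = [a - b for a, b in zip(xo, xbar)]
    w = solve(sigma_hat, d)
    t2 = n / (n + 1) * sum(di * wi for di, wi in zip(d, w))  # statistic (21)
    crit = (n - 1) * p / (n - p) * f_crit  # mp/(m-p+1) F with m = n - 1
    per_variate = []
    for i in range(p):
        t2_i = n / (n + 1) * d[i] ** 2 / sigma_hat[i][i]  # Corollary 1.1
        per_variate.append((t2_i, t2_i > crit))
    return t2, crit, per_variate


xs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(screen_observation(xs, [5.0, 1.0], f_crit=2.0))
```

Here x̄ = (1, 1)' and Σ̂ = (4/3)I. The overall statistic is 9.6 against a criterion of 6.0. Only the first variate is flagged.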

The  above  does  not  exhaust  the  set  of  possible  applications.  Hopefully  this  application 
will  suggest  others  to  the  reader. 
APPENDIX


TWO  WORKING  THEOREMS 

The  first  theorem  (A-1)  was  shown  in  reference  1  by  the  author.  The  second  theorem 
(A-2)  was  shown  by  Anderson  (reference  5)  in  1958. 


A-1

Suppose that x_o is an observed p-variate vector from N(μ, Σ), that

$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i $$

is the mean vector of n independent observations also from N(μ, Σ), and that mΣ̂ is the sum of the matrix products of m independent N(0, Σ) p-variate vectors (Z_1, Z_2, ..., Z_m), i.e.,

$$ m\hat{\Sigma} = \sum_{i=1}^{m} Z_i Z_i' $$

Then

$$ \frac{n}{n+1}\,(x_o - \bar{x})'\,\hat{\Sigma}^{-1}(x_o - \bar{x}) $$

is distributed as

$$ \frac{mp}{m-p+1}\,F $$

where F has p and m − p + 1 degrees of freedom.
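A-1 can be checked by simulation. The sketch below is an illustration only, using μ = 0, Σ = I, a fixed seed and the standard library. It averages the statistic over many draws and compares the average with the mean of [mp/(m − p + 1)] F(p, m − p + 1). The mean of an F variate with ν₂ denominator degrees of freedom is ν₂/(ν₂ − 2), so the target is mp/(m − p − 1) when m > p + 1. The function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def simulate_a1(p, m, n, reps, seed=1):
    """Average the A-1 statistic over `reps` draws with mu = 0 and Sigma = I."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xo = [rng.gauss(0.0, 1.0) for _ in range(p)]
        xs = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        xbar = [sum(col) / n for col in zip(*xs)]
        zs = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(m)]
        # m * Sigma_hat = sum_i Z_i Z_i', so Sigma_hat is unbiased with m df
        sigma_hat = [[sum(z[i] * z[j] for z in zs) / m for j in range(p)]
                     for i in range(p)]
        d = [a - b for a, b in zip(xo, xbar)]
        w = solve(sigma_hat, d)
        total += n / (n + 1) * sum(di * wi for di, wi in zip(d, w))
    return total / reps


p, m, n = 2, 10, 5
sim_mean = simulate_a1(p, m, n, reps=20000)
# Mean of (mp/(m-p+1)) F(p, m-p+1) is mp/(m-p-1), valid when m > p + 1
theory_mean = m * p / (m - p - 1)
print(round(sim_mean, 3), round(theory_mean, 3))
```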
A-2

If x is distributed according to N(μ, Σ), then z = Dx is distributed according to N(Dμ, DΣD').
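Taking D = c, a 1 by p vector, in A-2 gives the univariate facts behind equation (2). The worked step below is an added illustration. It also uses x̄ ~ N(μ, Σ/n) and the independence of x_o and x̄ under H_0, which together produce the factor (n + 1)/n:

$$ c\,x_o \sim N(c\mu,\; c\Sigma c'), \qquad c\,\bar{x} \sim N\Big(c\mu,\; \frac{1}{n}\,c\Sigma c'\Big) $$

$$ c\,(x_o - \bar{x}) \sim N\Big(0,\; \frac{n+1}{n}\,c\Sigma c'\Big) $$

Hence (n/(n+1)) [c(x_o − x̄)]² / (cΣc') is chi-square with 1 df. Replacing Σ by the independent estimate Σ̂ with m df gives the F distribution with 1 and m df stated for equation (2).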